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Abstract. We present an efficient particle filtering algorithm for multiscale systems, that is 
adapted for simple atmospheric dynamics models which are inherently chaotic. Particle filters 
represent the posterior conditional distribution of the state variables by a collection of particles, 
which evolves and adapts recursively as new information becomes available. The difference 
between the estimated state and the true state of the system constitutes the error in specifying 
or forecasting the state, which is amplified in chaotic systems that have a number of positive 
Lyapunov exponents. The purpose of th e present paper is to show that the homogenization 
method developed in llmkeller et al] |2012l ]. which is applicable to high dimensional multi-scale 
filtering problems, along with important sampling and control methods can be used as a basic 
and flexible tool for the construction of the proposal density inherent in particle filtering. 
Finally, we apply the ge neral homogenized particle filtering algorithm developed here to the 
Lorenz'96 (|l orena [1995| ) atmospheric model that mimics mid-latitude atmospheric dynamics 
with microscopic convective processes. 

1. Introduction 

The main goal of filtering is to obtain, recursively in time, the best statistical estimate of 
a natural or physical system based on noisy partial observations of the same. More precisely, 
filtering problems consist of an unobservable signal process and an observation process that is 
a function of the signal corrupted by noise. This is given by the conditional distribution of the 
signal given the observation process. This paper deals with real time filtering of chaotic signals 
from atmospheric models involving many degrees of freedom. Since small errors in our estimate 
of the current state of a chaotic system can grow to have a major impact on the subsequent 
forecast, better estimate of the signal is needed to make accurate predictions of the future state. 
This is obviously a problem with significant importance in real time filtering and prediction 
of weather and climate which involves coupled atmosphere-ocean system as well as the spread 
of hazardous plumes or pollutants governed by extremely complex flows. A complete study of 
these complex systems with practical impact, involves models of extremely unstable, chaotic 
dynamical systems with several million degrees of freedom. 

Proper and accurate climate models can only be obtained by combining the models with 
data. Lower dimensional climate models require first the identification of slowly evolving climate 
modes and fast evolving non-climate modes. Contrary to standard initial value problems, we 
do not have access to the initial state of these dynamical systems. Instead, we have a known 
initial probability density function for the initial state and treat the model states as realizations 
of a random variable. Lower dimensional climate models also contain noise terms that account 
for the interaction of the resolved climate modes with the neglected non-climate modes. Hence 
these problems require efficient new algorithms to estimate the present and future state of 
the climate models, based upon corrupted, distorted, and possibly partial observations of the 
climate modes and fast evolving non-climate modes. While perfect determination of the state 
is impossible under these noisy observations, it may still be desirable to obtain probabilistic 
estimates of the state conditioned on the information that is available. 

The sheer number of calculations required in directly solving large-scale random dynamical 
systems becomes computationally overwhelming. Hence, we consider the Lorenz'96 model of 
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type II ( Lorend (l995 |) with two time-scale simplified ordinary differential equation describing 
advection, damping and forcing of some (slow) resolved atmospheric variables being coupled 
to some (fast) sub-scale variables as a nontrivial example of an atmospheric dynamics model. 
Though the model considered in this paper is a simple chaotic "toy model" of the atmosphere, 
which is an excellent test-bed for schemes that will be developed in Section it has a more 
realistic dimension than the multitude of "Lorenz 1963" -type studies (i.e., 360 dimensions rather 
than 3 dimensions). Even though this simple model is still a long way from the Primitive 
Equations that we need to eventually study, it is hoped that the filtering methods presented 
here can be ultimately adapted for the realistic models that need to be computationally solved 
for weather and climate predictions. 

In the past decade, vast amount of data has been produced via various sources from network- 
connected sensor arrays to satellites, all with a wide range of scale separations. To deal with 
such an onslaught of data, it is necessary to have a new framework capable of harnessing 
and processing these data with multiscale models. Hence, a data assimilation scheme that 
can handle chaotic systems that are sensitive to initial conditions (characterized by positive 
Lyapunov exponents) and noisy multiscale observations is needed. The theory of nonlinear 
filtering forms the framework in our study for the assimilation of data into multiscale models. 

Particle filters represent the posterior conditional distribution of the state variables by a col- 
lection of particles, which evolves and adapts recursively as new information becomes available. 
Hence, particle filtering provides a recursive procedure for estimating an evolving state from a 
noisy observation process. Despite the general applicability and rigorous convergence results, 
particle filters have not been extensively used in st ate estimation of hig h dimensional problems. 



for example, weather prediction, as pointed out in lSnvder et al.l [200a]. This is due to the fact 



that a large numb er of particles is required a r id the particle filters suffer from particle degeneracy 
(see, for example, Daum and Huana 2003l |. Snvder et al. 2008l |) in high dimensional systems. 



In this paper, we combine our study of stochastic dimensional reduction and nonlinear filtering 
to provide a rigorous framework for developing an algorithm for computing lower dimensional 
particle filters which are specifically adapted to the complexities of the underlying multiscale sig- 
nal. When the rates of change of different model variables differ by orders of magnitude, efficient 
data assimilation can be accomplished by constructing a particle filter approach for nonlinear 
filtering equations for the coarse-grained signal. In moderate dimensional problems, particle 
filters are an attractive alternative to numerical approximation of the stochastic partial differ- 
ential equations (SPDEs) by finite difference or finite element methods. Our approach described 
in this paper, not only reduces the computational burden for real time applications but also 
helps solve the problem of particle degeneracy. First objective is to predict the self-contained 
description of the coarse-grained dynamics without fully resolving the dynamics described in 
fast scales. In these problems, extracting coarse-grained dynamics is at the heart a problem 
of weak convergence of stochastic processes, or more precisely weak convergence of the laws of 
Markov processes. 

We begin in Section [2] by presenting the general formulation of the multiscale nonlinear filter- 



ing pr oblem. Here we introduce the homogenized equations that were derived in llmkeller et al 



2012l | for the reduced dimension unnormalized filter. In Section [3] we present the model that 



was originally suggested by Lorenz (Lorenz'96), which incorporated a pattern with convective 
scales and introduced a crude model of K slow variables plus JK fast variables that varies 
with two distinct time scales. This is an excellent test-bed for data assimilation schemes that 
are developed in this paper. Section [3] outlines a sequential particle filtering algorithm (the 
Sequential Importance Sampling method) and a numerical a lgorit h m for di n iensio nal reduction 
(the Heterogeneous Multiscale Method of IVanden-EiindenI [2003| |. Ie et"aD [2005^ ). These two 
methods are combined to give an efficient algorithm for particle filtering in a multiscale environ- 
ment. Finally, we present in Section [5l the results from several data assimilation experiments on 
the Lorenz'96 model and discuss future research directions based on "adaptive" or "targeted" 
observations and sensor placement. 
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2. Formulation of multiscale nonlinear filtering problems 



The results presented here are set within the context of slow-fast dynamical systems, where 
the rates of change of different variables differ by orders of magnitude. The effects of the mul- 
tiscale signal and observation proc esses via the study o f lower dimensional Zakai equations in 
a canonical way was presented in Imkeller et alj 2011 . 2012], where the convergence of the 
optimal filter to the homogenized filter is shown using backward stochastic differential equa- 
tions (BSDEs) and asymptotic techniques. This paper provided rigorous mathematical results 
that support the numerical algorithms based on the idea that stochastically averaged mod- 
els provide qualitatively useful results which are potentially helpful in developing inexpensive 
lower-dimensional filtering. For the reduced nonlinear model an appropriate form of particle 
filter can be a viable and useful scheme. Hence, we present the numerical solution of the lower 
dimensional stochastic partial differential equation derived here, as it is applied to several higher 
dimensional multiscale applications. 

We will present the main result of I Imkeller et al.l 2012l | here. Let (ri, J^, {J-'t),Q) be a filtered 
probability space that supports a standard Brownian motion {V,W,B) G -^kxixd^ consider 
the signal process that is the solution of the two time scale stochastic differential equations 
(SDKs) driven by iW,V): 



(la) 
(lb) 



dXf = b{Xt, Zt)dt + a{Xl ZDdVt 



t, 



dZf = -f{Xf, Zl)dt + -^g{Xl Zf)dWt, Z, 
£ ye 



Here, e << 1 is the timescale separation parameter, so X^ is the slow component and Z^ is 
the fast component. W , V and B are independent of each other as well as the random initial 
conditions ^ and r/. The functions /, g, b, a are assumed to be Borel-measurable. 

Associated to the signal is the d-dimensional observation that is perturbed by the Brow- 
nian motion B^ given by 



y/= [\{xi,zi)ds + Bt 

Jo 



where /i is a Borel-measurable function. Define the sigma algebra generated by the observation, 
= a{Yg : < s <t)V Af, where Af are the Q-negligible sets. 

The main objecive of filtering theory is to obtain the best estimate of the signal process 
based on information from the observation. The best estimate, called the optimal filter (the 
Zakai equati on), is a conditional expectation that satisfies a recursive equation driven by the 
observation (IZakail 196&] ). For a formal definition, denote the optimal filter for the multiscale 
system by vr^, a finite measure on R'""'""' . The goal is to calculate the (normalized) observation- 
dependent filter 7rf{(f) =^ 'EQ[(p{Xf , Zf)\y^], where 93 is a bounded measurable function on 

In order to calculated the specified conditional expectation, it is easier to work on a new 
probability space P^, on which the observation is a Brownian motion independent of the 
signal noise (IV^,y). is related to the original probability space Q by the Girsanov transform 
Bain and CrisanI (2009t | 



(2) 



„^ def dF' 



exp h{Xl ZtfdBs \h{Xl, Zl)\''ds 



which effectively removes the drift h from the observation equation 

def r f trc ryc\ I T^c\—\ 



filter as p\{^) = Ep4^{Xf , Z^) {Dl 
related using the Girsanov transform: 



Define an unnormalized 
The normalized and unnormalized filters are 



7rti^) = EQ[^{x!,z!)\y[_ 



EM^{X!,ZI){DI)-^ 
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Th e unnormalized filter satisfies the Zakai equation (see, for example, iBain and Crisan 
20091 ] ): 



Here, = ^Cp + Cs is the differential operator associated to (X^, Z^), with 

" 9 1 " 92 



^ ' dzi 2 ^ ' dzidzj 

1=1 *ii=i 

™ a 1 ^ ^2 



dxi 2 ^ dxidxj 

where -"^ denotes the transpose of a matrix or a vector. 

Next, we briefly present a result from stochastic averaging theory for the multiscale setting 
of the problem considered. We assume that for every x G M'", the solution of (jlbp with 

= X fixed is ergodic and converges rapidly to its unique stationary distribution Poo{x : •)• 
this case, it is well known that X'^ converges in distribution to a diffusion governed by an 
SDE 

= h{Xf)dt + a{X^)dVt 

for appropriately averaged h and a. In other words, a stochastically averaged model provides 
a qualitatively useful approximation to the actual multiscale system. Hence, if we are only 
interested in estimating the slow process, or the "coarse-grained" dynamics, then we should 
make use of the homogenized diffusion X^ . Specifically, if we are only interested in the x- 
marginal, vr^'^, of the optimal vr^, then this X^ can be used to construct an averaged, or 
homogenized, filter vr^ that can appropriately replace tt"^'^. In high-dimensional problems, vr'^ 
would be preferable to vr'^'^ since calculation of vr'^ using X^ does not directly involve calculations 
of the fast process. 

By making use of X^ , we would like to find a homogenized (unnnormalized) filter p^ that 
satisfies 

dp1{^) = p1{C^)dt + pUhip)dY,', p^ip) = Eq[(/p(XO)], 

such that for small e, the x-marginal of /O^, p^'^, is close to p^. We let the generator £ of X^ be 
defined as 

o 



dxi 2 ^ dxidxj 

where = j b{x,z)p^{x^dz) anda = f{aa'^){x, z)poo{x,dz). Also, define = f h{x, z)poo{x,dz). 
Note that the homogenized filter is still driven by the real observation and not by a "ho- 
mogenized observation", which is practical for implementation of the homogenized filter in 
applications since such avearaged observation is usually not available. However, even if such 
homogenized observation is available, using it would lead to loss of information for estimating 
the signal compared to using the actual observation. 

Now define the measure-valued processes vr" and vr^'^ in terms of p^ and p^'^ as was in 
terms of p^: 

The main result of Imkeller et al.l (2OI2I ] is that under appropriate assumptions on the coefficients 
of (fTa|) . (jlbp and h, there exists a metric d on the space of probability measures such that for 
every T > there exists C > such that 

Eq [d(7r^''',7r^)] < ^/eC, i.e. limEQ [d{Tr'^'' , tt^)] = for any T > 0. 



In other words, the j;-marginal of the optimal filter vr^ converges to the averaged filter vr'' 
in the space of probability measures as the separation parameter e goes to 0. Hence, the 
homogenized filter is an appropriate measure to use in place of the actual vr"^'^ in estimating 
the "coarse-grained" dynamics in a setting with wide timescale separation. In terms of 
filtering applications, vr*^ presents the advantage of not requiring exact knowledge of the fast 
dynamics for the purpose of estimating the "coarse-grained" dynamics. Only knowledge of 
the invariant measure of is required, hence by applying appropriate multiscale averaging 
numerical schemes, computation and information storage for the fast dynamics can be reduced. 

The convergence of the normalized filters, vr^'^ — )• vr'', was shown by first obtaining the 
convergence of the unnorm alized filters, p"^ '^' to . The dual representations of p'^^^ip) and 
Pj^{ip) were introduced as in |Pardou?3 |l979l |: 

vf'^{x,z) '^'E^.^MX^){Dy)-'\yl^], and = Epo [^^(XO ) (^V)"' |3^f,T]- 



and P^j, are the respective measures under which {X'^,Z'^) and X^ are governed by the 
same dynamics as under and but {X^,Z^) and X^ stays in {x,z) and x until time t. 
DIj. and D% were defined as the Girsanov transform ([2]), but with moving limit of integration 



exp (- h{Xl, ZlfdB, \h{Xl, Zl)\^ds^ , 



DlT = e^v{-j^ hiX^fdY,' + ^l^ \h{xXdry 

and y^rp = a{Y^ — Yf:t<r<T)vMis the filtration generated by the observation over [t, T], 
minus observation history from to t. The Markov property of {X^, Z^,X^) results gives the 
relations between the unnormalized filters and the respective duals: 



(3) pj^ '^'^^^ j ^^^'^)'^(^6.^o)('^^'^^) ^'^'^ Pt{'P) = J Vq ''^{x)Qxoidx). 

Note that Qxo = Qxgi because the homogenized process has the same starting distribution as 
the un-homogenized one. 

For fixed T and (f G C^(M"^,M), we will write = vl''^'^ and = w^'"^'"^. The dual process 
vl essentially represents the conditional expectation by an alternate conditional expectation 
that is run backwards in time from T to 0. For fixed starting point (x, z) sXt = 0, this backward 
in time conditional expectation is constructed using processes {X^,Zj,) that started at {x,z) 
and ran backwards to t = 0. By integrating Vq over Q(xg,zg)) we are integrating over all possible 
starting points {x,z), hence giving p^.. This interpretation is similar for of p^ . 

From ([3]), we have 

(4) npr'{v)-PTm'] < I nKix,z)-v'o{x)m^xE,z^){dx,dz). 

So, if \vq{x,z) — Vq{x)\ is small, then \pf^^{ip) — Pj'{ip)\ will also be small as long as Q{xg,zg) is 
well behaved and (jU will lead to the normalized filter convergence result. Therefore, the aim 
is to show that for nice test functions ip, E,[\vq{x, z) — Vq{x)\p] is small. 

The key point is that and solve backward SPDEs. We formally expand as 

vf{x,z) = Ut{x,z) + eul^^ {x,z)+e^u^/^{x,z) . 

^°{t,x) ■4>(t,x,z) R{t,x,z) 

Then, v^, ip and R satisfy linear partial differential equations with appropriate terminal con- 
ditions. By existence and uniqueness of the solutions to these linear equations, we can apply 
superposition to obtain that indeed 

vf{x, z) = Vi{x) + iptix, z) + Rt{x, z), 
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where ip and R are the corrector and remainder terms, respectively. Based on the expansion, 
the problem of showing L^-convergence of f ^ to reduces to showing L^'-convergence of ip and 
R to 0. 



Details of the proof are provided in llmkeller et alj |2012[ |. The outline of the method of proof 



is as follows: The backwards SPDEs were converted to their respective proabilistic represen- 
tations, which are backward doubly stochastic differential equations (BDSDEs). The diffusion 
operators were replaced by the associated diffusions and explicit estimates for the finite di- 
mensional B DSDEs in terms of the transition density function of the fast process were able to 
be obtained. IPardoux and Veretennikov 2003l | proved very precise estimates for this transition 
function, and were used to obtain the desired bounds on ip and R. 



3. The Lorenz'96 System 

The systematic strategy for the identification of slowly evolving climate modes and fast evolv- 
ing non-climate modes requires a lengthy analysis. In this paper, we consider a simple atmo- 
spheric model, which nonetheless exhibits many of the difficulties arising in realistic models, to 
gain insig ht into predict ability and data assimilation. The Lorenz'96 model was originally intro- 
duced (in iLorenzl \l99^ ) to mimic multiscale mid-latitude weather, considering an unspecified 



(5) X^ = -Xt\xt'-X^+')-X^ + F, + ^^Z^'\ k = l,...,K, 



scalar meteorological quantity at K equidistant grid points along a latitude circle: 

J 

(6) z^-' = \ { -z';-'*'!^^-'^ - z";-'-') - z^-' + KXi], j = 1, . . . , J. 

Equation ([5]) describes the dynamics of some atmospheric quantity X, and X^ can represent the 
value of this variable at time t, in the k^^ sector defined over a latitude circle in the mid-latitude 
region {Note: We use superscripts k and j to conform with the typical spatial indexing notation 
used for the Lorenz '96 model. In sections that follow, subscripts k and j will be used as discrete 
time indices, not to be confused with the spatial indices of the Lorenz model). The latitude 
circle is divided into K sectors (with values of K ranging from K = A to K = 36). Each X^ is 
coupled to its neighbors X^^^, X^~^, and X^'"^ hy ^ applies for all values of k by letting 
j^k+K ^ ^k-K ^ ^k^ example for k = I, = , X^'^ = Xf, and X^'^ = Xf. 

The system was extended to study the influence of multiple spatio-temporal scales on the 
predictability of atmospheric flow by the dividing each segment k into J subsectors (J = 4 to 
J = 10), and introducing a fast variable, '■' given by ([U]), associated with each subsector. 
Thus, each X^ represents a slowly- varying, large amplitude atmospheric quantity, with J fast- 
varying, low amplitude, similarly coupled quantities, Z^'\ associated with it. In the context of 
climate modeling, the slow component is also known as the resolved climate modes while the 
rapidly-varying component is known as the unresolved non-climate modes. The coupling terms 
between neighbors model advection between sectors and subsectors, while the coupling between 
each sector and its subsectors model damping within the model. The model is subjected to 
linear external forcing. Fx, on the slow timescale. 

The dynamics of the unresolved modes can be modified to include nonli near self-interaction 



effect s by adding forcing in the form of stochastic terms (see, for example, iMaida et al.l [2001 



20031]). The use of stochastic terms to represent nonlinear self- interaction effects at short 



timescales in the unresolved modes is appropriate if we are only interested in the c oarse-grained 



dynam ics occuring in the long timescale. This is called stochastic consistency in iMaida et al 



2003l |. Considering ([SD, where only quadratic nonlinearity is present, the motivation behind 



adding stochastic forcing is thus to model higher order self-interaction effects. This can be done 
using a mean-zero Ornstein-Uhlenback process, specifically a process with amplitude -h=, that 



IS, 



(7) zf'^- = \ - Z',^^'') - Z\^^ + KX^] + i=c(t). 

This is done more expli citly in a gene ral context in Section 01 The two-scale Lor enz'96 model was 
also u sed extensively by Wilks 2005 ] to study stocha stic parametrization and by Lorenz and Emanuell 



1998| | for analyzing targeted observations, and by Herrera et al. 2011 ] and several others for 



analyzing the influence of large-scale spatial patterns on the growth of small perturbations. 
4. Homogenized Hybrid Particle Filter (HHPF) 

Numerical simulation of multiscale dynamical systems is quite problematic because of the 
wide separation in the time-scales involved. Based on the results presented in Section [21 we 
combine our study of stochastic dimensional reduction and nonlinear filtering to provide a rig- 
orous framework for developing an algorithm for lower dimensional particle filters specifically 
adapted to the complexities of underlying multiscale signals. The proposed particle filter al- 
gorithm is adapted to a system with time-scales separation by incorporating a homogenization 
scheme in the nonlinear filter, in conjunction with the stochastic dimensional reduction results 
of Section [2j 

In order to apply stochastic dimensional reduction in the problem presented in Section [3l we 
introduce a stochastic forcing term in Q to represent external forcing effects on the system 
due to unresolved dynamics. To illustrate the homogenization scheme, we consider a general 
system of stochastic differential equations (SDKs) that corresponds with the problem of Section 
[31 with stochastic forcing in the fast component: 

(8) Xf = 6(Xf,Zn, Xl^M{^l,al) 

Zl = e-'fiXl Zt) + e-^'^g{Xt, ZfWu Zl ~ AA(/zg, ^q^). 

In the following two subsections we explain the Heterogeneous Multiscale Method and Se- 
quential Importance Sampling. These techniques are then combined in the Homogenized Hybrid 
Particle Filter. 

4.1. Multiscale numerical integration. For numerical simulations of ([8]), the timestep bt 
required for the forward integration of the fast process Z^ needs to be smaller than e for stability 
of the integration scheme. However, with such timestep, significant changes in the slow variable 
can only be seen on the time-scale of 0(1), i.e., much of the computational resources is wasted 
in solving for an almost stationary evolution of the slow process X^ . 

Based on the results presented in Section [2l we can solve this problem by adopting a reduced 
system through stochastic averaging, that is, 

(9) kt = h{Xt), 
where 

(10) h{x) = j bix,z)fi^{dz). 

By adopting Q with (flU]) . we are under the assumption that dSj) is exponentially mixing, and 
the Doeblin condition is satisfied. Stated informally: With the slow process fixed at X^ = x, 
the corresponding fast process Z^ has a unique invariant measure fixi^z), i.e. the transition 
probability measure P^{z,t) of Zf converges exponentially to fixidz) in the weak sense as 
t — )• oo, locally uniformly in x and z. This implies that for any test function (p G Cf,(M™) 

E,[ip{Zf)]^ j ip{z)nxidz) as t ^ oo, 

uniformly in x and z in any compact set. 

However, the evaluation of the high dimensional integration in (jlOp is nontrivial, and usually 
it is impossible to obtain an invariant distribution of the fast variable analytically. To determine 
the invariant distribution numerically, we adopt the Heterogeneous Multiscale Method (HMM) 
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introduced in IVanden-EiindenI 2003l |. The HMM is a method of determining the effective dy- 
namics Q through numerical approximation of the invariant distribution of the fast process. 
It is based on the observation that the fast variable reaches its invariant distribution (equi- 
librium) on a time-scale much smaller than the time-scale needed to evolve the slow variable 
(the Doeblin condition). This implies that we could use a much larger timestep value At 
for the evolution of the slow process while keeping the timestep for the fast process small for 
stability. 

Following the procedure presented in Vanden-Eii nden 20031 ] . we describe the HMM. The 
evolution of the averaged equation ([9]) is approximated numerically using a forward inte- 
gration scheme. For simplicity, we will use the Euler and Euler-Maruyama schemes in the 
following description , alth ough higher order schemes are also applicable (see, for example, 
Kloeden and PlatenI (l992 |). Consider the interval [0,T] discretized into timesteps of size 



At = |_-^J • Let tk = A; At and write Xt^. as X^. 



(11) 



X. 



k+l 



Xk + b{Xk)At 



is the macro-solver with At being the macro-timestep (Note: As mentioned in Sectional sub- 
cript k here indicates time index, different from the spatial index superscript k in the Lorenz 
'96 model). The value of b, which is an approximation of the averaged coefficient b in (jlOp . can 
be calculated as 



(12) 



bi^k 



1 



MN„ 



M TlT + Nm 

E 

1 j=nT 



E 



where M is the number of replicas of the fast process Z for spatial averaging, Nm is the number 
of micro-timesteps 5t for time averaging, and rix is the number of micro-timesteps skipped to 
eliminate transient effects. The evolution of Zt'^- is governed by the following micro-solver 



(13) 



+ -f{Xk, 



Z'/.)5t + ^g{Xk, 



Note that a cornbinati on of spatial and temporal averaging is used in ()12p but it is shown in 
Vanden-Eii nd^ [2003^ that the combinations of M, Nm, and 6t can be chosen based on error 
analysis such that no spatial averaging or no spatial and temporal av eraging is require d. For 



more detailed explanation and erro r analysis, refer to the references I Vanden-Eii ndeni 20031 1 
Fatkullin and Vanden-Eii ndeni 2004]. 



4.2. Homogenized Hybri d Particle Filte r (HHPF). The algorithm for the continuous- 



time HHPF is presented in Park et al. 2O10t ]. As in standard particle filtering methods, the 



continuous-time equations can be discretized and the filtering can be done on the resulting 
discrete-time models. Here, we present the discrete-time version of the HHPF using the sequen- 
tial importance sa mpling (SIS) algorithm, w h ich is also coinmonly known as bootstrap filtering 



tiai importance sa mpnng ( sis ) aigoritnm, w n icn is also coinmonly h 
(see, for example, Arulampalam et al. 2002| ]. Gordon et al. 19931 ]). 



We will first provide a brief overview of the idea behind importance sampling and then 
illustrate how it is applied sequentially in particle filters. Then we present how the HHPF uses 
the SIS algorithm. 



4.2.1. Importance Sampling. In particle filtering, there is always the need to represent some 
distribution using a collection of particles. When it is difficult to sample from a given distribu- 
tion, the idea is to sample from another distribution that is more tractable to sample from, and 
properly use that sample to represent the distribution of interest. 

Importance sampling is a technique for approximating integrals with respect to one probabil- 
ity distribution using a collection of samples from another. Let p be the target distribution of 
interest over space X and q ^ p (qis absolutely continuous with respect to p) be the distribution 
from which sampling is done {q is also called the proposal distribution). Denote by Ep[.] and 
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Eg[.] the expectation with respect to the distributions p and q, respectively. For any integrable 
function : X — )• M, we have 

(14) ^pHX)]= [ ^{x)p{dx)= [ ^{x)^{x)q{dx) 

ip{x)w{x)q{dx) = Eg [w{X)ip{X)] , 



where w '= 

A collection {x*}^^ of particles can be sampled from q and the particles can be weighted 
according to w"^ oc ^(a^*) to represent the target distribution p i.e. 

p{x) ^tt;*(^(x - X*). 

i=l 

The weights w"^ are normalized such that X]""^* — 1- 

The strong law of large numbers can be employed to verify that the empirical average of ip 
with respect to the weighted sample from q converges as A'^s — )• oo, with probability 1, to the 
expected value of tp under the target distribution p, i.e. 



' Ns 



ip{x)dx 



Ns 

^w'ifix' 



4.2.2. Sequential Importance Sampling (SIS). The SIS algorithm is a technique of using Monte- 
Carlo simulations for Bayesian filtering through the incorporation of importance sampling. Con- 
sider a discrete-time signal Xt,, and a discrete-time observation 1^^. and let p (xo:fc| yo-k) be the 
density of the target posterior distribution at timestep tk- The SIS algorithm approximates the 
target distribution using appropriately weighted samples from a proposal density q{xo-k\ Vo-.k)- 
Suppose we represent p {xQ-k\ yo-k) using a collection {xq.^j,}^^^ of Ng particles sampled ac- 
as 



where w], oc 



(Xoifc 


yo:k) 


p(<fe 


yo:k) 


9(4:* 


yo:k) 



Ns 

1=1 



1. We choose to arrive at proposal densities sequentially 



with wl 

g(xO:fc| yO:k) = q{Xk\ Xk-l,yk) q (X0:fc-l| yo;fc-i) • 



Making use of the identity (see, for example, Arulampalam et al. 2002l |. equation (45)) 

. I N P(yfc|Xfc)p(Xfc|Xfe„i)p(xo:fe-l|yO:fc-l) 
P{XO:k\ yO:k) = 7 \ ^ , 

we write (after removing p{yk \ yo-.k-i) because it is same for all the particles) 



(15) 



Wl, oc 



oc 



p{yk\xi] 


P{ 




p 


[^0:k~l \ yO:k- 


l) 


q{xi 


4 


-i,yk) 


q 


[4:k--l\yO:k- 


l) 



p(yfcl4)p(4l4-i) 



,414-1'yfej 



w 



fc-1- 



Hence we have a sequential formula for computing the unnormalized particle weights. Because 
our choice of q is done sequentially, the proposal density depends on the location of the particle 
only at the previous time step. Only x^_^ need be stored and the rest of the path Xq.^ can be 
discarded. We then have 



P{Xk\yO:k) ^^Wl6{x - Xl) 



i=l 



with weights updated according to (fTSj) and normalized by X^j^fc — ^■ 
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So, the SIS algorithm is this: At step k — 1 the location of particles {x\_^}f^^ is known. 
New observation is recorded. Choose a form for the proposal q{•\x\_-^^,yli.). Sample a par- 
ticle according to this proposal density and get the new location x^.. Evaluate the number 
q{xl.\xl._^,yk), and then evaluate the quantities in the numerator of equation using the 

sensor dynamics and the signal dynamics. Then update the weights according to equation (|15p . 

In filtering with a fixed number of particles, it is crucial to keep the variance of the weights to 
minimum: if a collection of particles is such that the weight is concentrated in a small number 
of particles, this collection represents the distribution poorly. One can better it by selecting 
a collection which has many particles near the region of high concentration and the particles 
sharing nearly equal weights. As can be seen from the weight update equation ()15p . the particle 
weights depend crucially on the choice of the proposal distribution q. So, q should be choosen 
with an aim of mi nimizing the variance of p article weights. 

It can be shown Arulampalam et al. 2002l | that the proposal density which keeps the variance 



of the weights to a minimum is 

/Ift^ opt/ I ^ P{yk\xk)p{xk\xk-l) 

(16) q^[Xk\xk-i,yk)- 



f p {yk\ xk) p {xk\ Xk-i) dxk' 

Employing this in weight update equation [TJ] we have that, if we choose the optimal proposal 
density, 

(17) wlcKwl_^ I p{yk\xk)p{xk\x\_]) dxk- 



In general case, it is difficult to sample from q°^^ . However, it is e asy when both the likeliho od 
P {yk\ Xk) and the conditional prior p {xk\ Xk-i) are Gaussians, cf. lArulampalam et al.l j2002l |. 
Consider the system dynamics and observation equation: 

(18) Xk+i = F{Xk) + axWk+i 

(19) Yk+i = HXk+i + (JyVk+i 

where Wk and Vk are independent centered Gaussian increments with variance one and Q '= 

axf^x, R "= (Tycty are strictly positive definite. Since this subsection on SIS and the next one 
on optimal control discuss a general method, we have used F to denote the vector field instead 
of b and / as in the previous sections. We then have that the likelihood p{yk\xk) = M{Hxk, R) 
and the conditional prior p {xk\ Xk-i) = M{F{Xk^i), Q). Using (fT6|) . we have 

q°^^{xk\xk-i,yk) =M{F{xk-i) +a{xk-i,yk),Q), 
Q= {Q-^ +H^R-^Hy\ 

(20) a(xfc_i, 2/fc) = QH^R-\yk - HF{xk-i)). 

This is a Gaussian. Once we have particle locations {x\_i}^^ representing the posterior at 
time k — 1, and the observation yk is recorded, we can evaluate a{xk-i,yk)- We can then sample 
a particle from the above Gaussian A/'(F(2;^_-^) + a{x^j^_^,yk),Q) and arrive at a collection 
of particles {x^}. Alternatively, the particles can be evolved according to 

(21) Xk+i = F{Xk) + a{Xk, yk+i) + axWk+i 

where ax is such that axO''x ~ Q- Then Xj, behaves like a particle sampled from ^°p*(-|x^_^, y^). 
The weights are updated according to (fT7|l : we will have 

wl cc wl_,eKp!^-^{yk-HF{xl_,)fR-Hyk-HF{xl_,))Y 

(22) R-^ = R-^ (l - HQH'^R-^^ . 
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4.2.3. Stochastic optimal control approach. We consider the same discrete-time nonlinear system 
with linear observation (|18p . (|19p . The particle method presented in this section consists of 
control terms in the "prognostic" equations, that nudge the particles toward the observations. 
We nudge the particles by applying a control Uk{y, x) according to 



(23) 



k+l 



F{Xk) + Ukiyk+i,Xk) + axW, 



k+l- 



The technique for determining the nudging term Uk{yk^ 
control by minimizing a quadratic cost: 



, x) is by method of stochastic optimal 



(24) 



J := 



k,x 



1 

2 L 



ul{y,x)Q ^Uk{y,x) + {y 



Hxl';f)) 



where Q, R are the signal and observation noise covariance matrices and by X^^'^^ we mean 
that the process X started at time k at the value x. Again, we assume the linear sensor function 
with observation available at every timestep, given by (|19|) . The purpose of the present section 
is to show that control methods can be used as a basic and flexible tool for the construction 
of the proposal density inherent in particle filtering. In this framework, u can be interpreted 
as the optimal control for minimizing the cost J. The first term in (j24p represents the control 
energy and if we allow u to become too big, then heuristically all the particles will coincide 
with the observation. Then the particles will be a sample from a Dirac distribution, whereas 
the conditional distribution that we try to simulate is absolutely continuous. The second term 
in (|24p represents the distance between HX^j^i and the observation that we want minimized. 
Covariance matrices Q and R in the quadratic terms indicate that dimensions of the signal 
and observation that have larger noise variance are penalized less by the control. This means 
that in directions where noise amplitude is large, we allow for more correction by taking Q~^, 
which puts less penalty on the size of the control in the dimensions with large noise amplitude. 
Similarly, the terminal cost given by the second term in ()24p incurs a penalty for being far 
away from the actual signal based on observation, but in directions where the quality of the 
observation is not very good, we allow our particle to be further away from the observation, 
hence R~^. 

Define the value function 

1 



V°P\k,x) 



inf Efc^j.— 

Uk{y,x) ' 2 



ul{y,x)Q-\k{y,x) + {y - h{xlll>)f R-\y - /^(xf/f )) 



Using (flSP and (fl9p . and substituting for ^^^i second expression in the value function, 

we have 

y - 



k)Y R~^ (y-H_ 
+ {HaxWk+if R-^ {HaxWk+i 



y - HF{x, Uk)) ^ R~Uy- HF{x, Uk))-2(y- HF{x, Uk)) ^ R'^ (HaxW^+i) 



where F{x,Uk) = F{x) + Uk{y,x). Because Uk depends only on y and x, both of which are 
given, we have E^^^: [ukQ~^Uk\ = UkQ'^^Uk, and similarly 



y-HF{x,Uk)] R-' (y- HF{x,Uk) 



y-HF{x,Uk)) R-'(y-HF{x,Uk) 



Wk+i is a standard Gaussian random variable independent of X^ = x, so 



E 



k,x 



y-HF{x,Uk)] R-\HaxWk 



+1) 



y - HF{x, Uk) R-'HaxE [Wk+i] = 0, 



E 



{HaxWk+if R-^ {HaxWk+i)] = tr (^{Haxf R'^ [Hax] 
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Therefore, 



V{k, x) = ^ inf i^ulQ'^Uk +(y- HF{x, Uk)f R'^ (y - HF{x, uu)) + tr ({Haxf R'' [Hax] 



and 



d 

V{k, x) = Q'^Uk + H^R~^H{F{x) + Uk) - H^R'^y 



duk 



{Q'^ + H^R-^H)uk - H^R-^ (y - HF{x)) , 



hence the optimal control is 



uf = (Q-i + H^R-'H)-'H^R-\y - HF{x)), 

which is similar to the solution from the previous approach, with the difference being the noise 
term, ax in (f2T]l and ax in (f23]l . 

This section provided the results related to the control design of the particles (prior to up- 
dating the weights) that is needed to nudge the particle solutions toward the observations. This 
procedure consists of adding, forcing terms to the "prognostic" equations for the construction 
of the proposal density inherent in particle filtering. However, it is possible that the nudging 
terms may become too large and destroy the balance between the effects of the noise and the 
control terms. The stochastic optimal control approach presented in this section is similar to 
the deriv ation of the 4 D-VAR method that is used in geophysical data assimilation (see, for 
example, Kalnav [20031 ]). The 4D-VAR method considers the problem of determining the best 



initial condition at time to for the forward integration of the model PDEs based on discrete 
observations collected, up to a finite time tx, in the future of to. In the 4D-VAR method, the 
cost function to be minimized with respect to the initial condition x{to) is 

J{x{to)) = ^[xito) - x'{to)fB,'[x{to) - x\to)] 

+ ^[Hix{tK)) - y{tK)f R-\H{x{tK)) - y{tK)], 

where x^{to) was predicted using the model equations from time before to and x{tK) is obtained 
by integration of the model PDEs using x(to) as initial condition. From this point of view, the 
stochastic optimal control approach presented here can be viewed as determining the optimal 
initial condition at every discrete time tk using the next available observation at tfc+i- The 
optimal control u^^ is the correction made to the state Xk predicted from t^-i- 

The SIS presented in Section 14.2.21 allows the modification of the drift terms as well as the 
stochastic coefficients in the "prognostic" equations and SIS is an easy approach to implement 
numerically, therefore we will use SIS in conjunction with HHPF throughout this paper. How- 
ever, when dealing with sparse data, a proposal densi ty based on s t ochas tic control theory 
presented in this section is essential, as can be shown in iLingala et aD [2012l |. 



4.2.4. HHPF. In the following we describe how the HHPF uses the SIS algorithm. For now, we 
would choose the proposal density g ( x\_^,yk) = p{-\x\_i) i.e. the conditional prior which can 
be obtained from the signal dynamics. A particle can be sampled from this q by propagating 
the location of the particle at A: — 1 using the signal dynamics. We note that by choosing such 
a (7 we are being blind to the observation y^. We address a better choice in Section [4.2.51 

The HHPF is developed based on the results presented in Section [2j It incorporates the 
HMM described in Section 14.11 in the particle evolution step of a particle filtering algorithm. 
Although the problem presented in Section [3] has continuous-time signal, the HHPF is applied 
on the associated discretized model. 

Consider the time interval [0,r], discretized into macro-timesteps of size At = [-^J and 
denote the micro-timestep hy 5t « At, where 5t is chosen small enough compared to e for 
numerical stability. In the context of the HMM, the intervals [kAt, {k + l)At], k = 1, . . . , N — 1, 
are the macro-timestep intervals over which evolution of the slow process occurs. The fast 
process is evolved over micro-timestep intervals [j5t, [j + l)5t], j = 1, . . . , riy + Nm within each 
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macro-timestep interval. The discretized version of the equation Q governing the coarse-grained 
dynamics Xt, is used for propagation of particles {xl.}.J^^ over macro-timesteps: 

(25) Xfc = Xk-i + b (Xk-i) At + a.AWk, 

where b is defined in ()12p . Note that (|25p is the same as (jlip but with additive noise, which is 
introduced to regularize the system, for the purpose of weight calculation. 

Instead of continuous observations, we assume that observations are available at discrete 
macro-timesteps. Write =^ Yf^. The discrete-time observations are 

(26) Y^ = h[xlZl^^^)+ayVk. 

Observations Y^ are used to update the sample by altering particle weights and resampling 
the system of particles. For notational consistency, we denote the particles representing the 
averaged X^ by x\. 

The evolution of the particle system via the HHPF is given by the following steps with a 
graphical illustration of each step shown in Figured! 




Figure 1. HHPF illustrative scheme 

(a) initial condition: Samples of Ns particles for the slow process, Xk, and Ng x M 
particles for the fast process, j (subscripts k and j are the macro-step and micro-step 
indices, respectively, not to be confused with the spatial indices, superscripts k and j, 
in Section [3|), are drawn from the initial distribution of the state variables. The initial 
distribution of the slow process, ttq, is approximated by a sample of Ng particles {xgljJi 
drawn from ttq, each of mass jy-, i.e. 

1 

^ 1=1 

Similarly, the initial distribution of the fast process is approximated by a collection 
of M particles < Zq^'^' > which are equally weighted (-Zq'*'^ G is the position 
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. Ns,M 

£,i,r 



of the r*^ particle). Note that we are only interested in the coarse-grained dynamics 
Xt of the system. The fast process, assumed to be exponentially mixing, is spatially 
and temporally averaged through the implementation of the HMM. Thus, under the 
assumption of ergodicity, we can set M « Ng or even M = 1 without loss of accuracy 
in approximating the distribution of X^- 
(b) prediction and update: The prediction and update step is where HHPF differs from 
regular particle filters through the incorporation of dimensional reduction and homoge- 
nization techniques. 

In a regular particle filter, particles evolve independently according to the discretized 
signal equations which govern the behaviour over a micro-step 5t. Recall that 5t is 
small compared to e. So, the regular filters simulate the slow process also every 5t^ even 
though it does not change much in such a small time, thus wasting the computation 
resources. 

The HHPF utilizes multiscale scheme of the HMM to propagate the sample forward 
in time. Particles {x\^^^^ are propagated independently over macro-timesteps At us- 
ing the macro-solver ()lip . where the averaged drift is approximated using ()12p . Particle 
propagations in a macro-timestep interval are illustrated in the "predict" segment of Fig- 
ure [TJ Within each macro-timestep interval, the particles representing the fast process, 

k fixed > , evolve according to the micro-solver (|13p over micro-timestep in- 

J i,r=l 

tervals \j5t,{j + l)6t], j = l,...,Nm — 1, while the slow process, Xf^ = x is fixed 
throughout the macro-timestep interval. Note that in the HMM scheme, < Nm 
only needs to be sufficiently large for the fast process to attain an invariant measure 
within [kAt, {k + l)At]. Also, the particles representing z^^^ that are propagated over 
the micro-timesteps are in total Ng x M, but M can be set to be 1 by appropriately 
adjusting the value of Nm, thus the number of fast particles that need to be propagated 
can be Ng, even with implementation of the HMM scheme. The coefficient b in (|12p 

(sir 1 ^"'^^ 

is computed by averaging over the particle locations < z^^Y : j = 1 ) • • • ) "-T + \ 

Averaging over •z^j-'^s is performed spatially over the sample of M particles and tempo- 
rally over the micro-timesteps interval [nyJi, [ut + Nm)5t\. The prediction step is given 
by (125]). 

The particles are updated at time-steps when observations are available, as illus- 
trated in the update segment of Figure [TJ Only the sample of fast process {x\^^^^ is 
updated, since, in the multiscale scheme, we are interested only in the dynamics of the 
homogenized X^. The SIS algorithm is used for updating the sample. In propagating 
the particles {x^}, as described above, according to the homogenized signal dynamics 
over At, we arrive at new location of the particles {x^^-j^}. These new locations can be 
thought of as a sample drawn from the proposal density q{-\xk) which is same as the 
prior p{-\xk) of the homogenized dynamics. Particle weights are calculated sequentially 
according to and because of our assumption of the proposal, we have 



w 



fc = P (2/114) 



We would like to point out another difference of the HHPF compared to a regular SIS 
filter, that arises due to homogenization. 

Considering independent standard Gaussian noise increments for the sensor noise, i.e. 
Vk ~ A^(0,I), in (j26p . the likelihood function is Gaussian: 

(27) p(y||xfc) ocexp|-^(y|-/i(xfc))'^(cr3^o-J) ^ (y| - /i (x^)) | . 
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Instead of the sensor function h 



h{x) 



the averaged sensor function 



is used, since we are deahng with the coarse-grained dynamics. As with the homogenized 
drift, the averaged sensor function is approximated by h via the HMM. h has the form 



hiXk 



1 



MNr, 



M Ut+N, 

E 



Zk,j,r) 



r=l 



It is worthwhile to note that the actual available observation is used instead of a 
fictitious averaged in calculating the weights. Thus the homogenized system ([9]) is 
combined with the actual observation Y^, from which the name "homogenized hybrid" 
was derived. 

After a few time steps, all the weights may tend to concentrate on a very few parti- 
cles, which drastically reduces the effective sample size. This issue is addressed in the 
following step. 

resampling: The nature of the sequential importance sampling algorithm is such that 
the variance of th e unnormalized weights increases wit h each iteration (se e, for example, 
Doucetl 19981 ] ■ Theorem, p. 285 in iKong et al.l 



m 



1994\ ). Due to the 



Prop. 3, p. 7 

gradual increase in weights variance over time, weights will tend to concentrate on a few 
particles, causing the issue of sample degeneracy. Sample degeneracy decreases the abil- 
ity of the weighted sample to properly represent the target posterior distribution since 
only a limited number of particles will have significant weights. Additionally, it incurs 
the cost of propagating and storing particles with insignificant weights, which effectively 
do not contribute to representing the target distribution. One method of addressing this 
issue is to perform occasional resampling when the sample degeneracy level reaches a 
certain threshold. Resampling does not overcome the issue of weights degeneracy; it 
only serves to rejuvenate the sample by eliminating particles with insignificant weights 
and multiplying those with weights that significantly contribute to approximating the 
posterior. 

The measure of sar aple degeneracy can be de termined by the effective sample size 
A^*^^ (see, for example, Arulampalam et al. 20021 ]): 



N, 



eS dof 



1 -I- Var {w*j, 



dcf 



pix". 



yo:k) 



-vVk) 



where w*^ is the "true" weight. The exact values of w*^ cannot be evaluated since 
Pi^klVo-k)^ the actual posterior, is not known, so the effective samp l e size is approximated 



numerically using the normalized weights by Arulampalam et al. 2002| ] 



eff 



1 



M 



< Ns represents the effective sample size at timestep k, and a resampling procedure 



is carried out whenever iV|^ falls below a set threshold. 

One resampling procedure is the syst ematic resampling procedure, as described in 
Arulampalam et al.l 2002], Doucet 19981 ]. The resampling procedure involves sampling 
with replacement on the current sample, multiplying particles with significant weights 
and discarding those with insignificant weights. The new sample is reinitialized with 
uniform weights for each particle. 

In addition to resampling, the issue of sample degeneracy can also be addressed by 
choosing a good importance sampling density q. As described so far, the importance 
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density q[xk\ x^^i) is based on the propagation of the sample from the previous esti- 
mation step. i.e. the observation was not involved in the importance density at time 
k. In Section 14.2.51 below, we propose a method of specifying an importance density 
Q{xk \ Xk-i,yk) at time k that makes use of the observation at time k . 
Algorithm [1] shows a pseudo code for the HHPF. Overall, the advantages of Algorithm [1] are: 

(a) The number of fast samples evaluations is greatly reduced (if the number of fast sample 
replicas is set as M = 1 by choosing Nm appropriately) , since Nm < [^^t / 6t\ . 

(b) The total number of timesteps is decreased due to the relatively large macro-timesteps. 

(c) The number of function evaluations are also decreased accordingly. 

Even though there are additional function evaluations (as in equation (jl2p ) in incorporating 
the HMM, these are negligible compared to the original weight calculations of the regular 
branching particle method, which have been reduced in the HHPF through (|29p . 

Although the HHPF has been adapted for multiscale computations, another issue arises in its 
application on complex problems with inherent chaotic nature such as that described in Section 
[31 As shown in Section [5l we are able to perform state estimation for a chaotic system using the 
HHPF as described so far, the procedure can be made more efficient by importance sampling. 
However, as mentioned earlier, a key aspect of the importance sampling algorithm is the choice 
of the sampling density q. In the following section, we present a method of constructing a good 
sampling density by introducing a forcing term in the particle evolution of ()25p . 



Draw samples from initial distribution: 
for k = l:number of macro-timesteps (K) do 
for r = l:number of replicas (M) do 

for j = 1: number of micro-timesteps (Nfn) do 

Solve micro-solver (fT3l) : -^fc^+i 
end for 
end for 

Perform averaging (fT2|) and ([29]) : {/(x^), /i(x^)}^''^ 
Solve macro-solver ([TT]) : 
Compute weights: 

Compute effective sample size: N^g^ 

Resample using choice of resampling algoritlini if -^^^^ ^ -^thres 

Reinitiahze: 4+li=i = ^'.tC 
end for 



Algorithm 1: HHPF ([Park et al.l |2011l 



4.2.5. HHPF using the optimal proposal. In the event that the averaged sensor function ([28p is 
expressible as h(x) = Hx + C^, where is a vector which can change with time k, then one 
can use the optimal proposal which keeps the variance of weights to minimum, as explained in 
14.2.21 Note that h is of this form if the observation function h is linear and only depends on 
the slow components. Only the following changes need to be made to the algorithm: 

• Instead of propagating the particles xl.^^ by using ([25p. we propagate according to 
(30) Xk = Xk^i + b (Xk^i) At + a{Xk, yu - Ck) + a^AWk, 

where dx is such that Q =^ (cj^cjJ) = {{oxO''^)^^ + {ayay)~^H)^^ , and a{Xk,yk) = 

QH^R-Hyk - Hf{Xk-i)), where /(x) = {x + b (x) At). 

• The weights are updated according to 

wi oc wi_^exp!^-^{yk-Ck-Hf{xi_^)fR-\yk-Ck-Hf{xi_^))Y 
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(31) R'^ = R-^ (l - HQH^R-^^ 



5. Application 

Based on the results of homogenization and optimal importance sampling, we have developed 
a new lower-dimensional particle filter, the HHPF, for state estimation in nonlinear multi-scale 
systems. In this section, we illustrate the HHPF's potential for state estimation in a high 



dimen sional complex problem by applying the HHPF algorithm to the Lorenz '96 (jLorenz 



I995I ]) atmospheric model (see Section [3]) to estimate the slow variables. The HHPF algorithm 



is implemented in discrete time, using SIS, as presented in Section [4.2[ The model parameters 
for application of the HHPF are as follows: -ftT = 36 J = 10, = 10, hx = —0.8, = 1, 
e = 1/128. 

The model was first simulated for 40960 timesteps of size 2~^^ starting from arbitrary initial 
conditions to represent the "true" signal X'^ that was to be estimated. The timestep value 
of 2~^^ was picked such that it is small enough compared to the separation parameter e to 
ensure numerical stability. The Lorenz '96 model given by ([5]), d?]) was integrated using a split 
timestepping scheme: fourth order Runge-Kutta scheme for the deterministic drift and the 
Euler-Maruyama scheme for the stochastic parts. These schemes were selected for simplicity of 
implementation in the numerical experiments, but of course, coarser timesteps may be used by 
implementing higher order integration schemes. Observations generated from the true states 
were of the form assumed in Section I4.2.5t = HXf + Bt, i.e. depends linearly on X^, 
perturbed by a standard Gaussian noise. Observations were generated every 128 timesteps, i.e. 
at every timestep of size 2~^. This is the size of the macrotimestep At that we select for the 
HMM integration scheme. This means that we are considering the case where observations are 
available sequentially (at every tir aestep) at the timescale that we had chosen for the numerical 
integration of the slow process. In Lorenj (l995 |. the timestep chosen for numerical integration 



of the mul tiscale system with e = 0.01 was 0.05 , which corresponds to 36 minutes in real time. 

in sel ecting At = 2~^ and 6t = 2~^^ for the 



We follow iFatkullin and Vanden-EiindenI 12004 



HMM scheme for e = 2 Based on Lorenz 19951 ] 's scale, these timesteps approximately 



correspond to 45 minutes and 35 seconds, respectively, in real time and the duration of the 
model simulation and data assimilation experiments correspond to 10 days (14400 minutes). 
On the real time scale, the assumption of sequentially available observation data corresponds 
to observation data collection every 45 minutes, which is not an unrealistic assumption. 

We consider two cases for the linear observation matrix H. The first is H = IkxK, where / 
is the identity matrix, i.e. all slow states are observed (non-sparse observations). The second is 
Hij = 1 if i = J and i odd, and otherwise Hij = , i.e. only the odd-indexed slow dimensions 
are observed (sparse observations). Observation noise covariance matrix is assumed to be the 
identity matrix, i.e. observation noise in each dimension is independent of the rest. We chose 
the slow and fast signal noise covariance ruatrice s to be 1 on the diagonals and 0.5 on the 
sub- and super-diagonals as in Ivan Leeuwen (2010l |. with the fast noise scaled by e Initial 



conditions for the true signal are arbitrarily chosen from mean zero normal distributions with 
variances 3 and 5 for the slow and fast processes, respectively. The filtering objective is to 
estimate the slow states using sequentially available observations. We discuss the numerical 
experiments in the following. The numerical experiments were performed using MATLAB v. 
7.11.0.584 (R2010b), without explicitly employing parallel processing capabilities, on an Intel 
Xeon DP Hexacore X5675 3.07 GHz processor with 12x4 Gb RAM. 

5.0.6. Optimized HHPF with linear observations. In the first set of numerical experiments, we 
consider the case of non-sparse observations and consider two variants of the HHPF to study the 
effects of the optimal particle propagation for linear observation functions discussed in Section 
14.2.51 The SIS algorithm is used in both variants of the HHPF. In one variant, the proposal 
density used for the importance sampling procedure is generated using particles propagated 
using the optimal drift and diffusion as given in ()20p . with weights updated accordingly. In the 
other variant, proposal density is the prior density generated by propagating particles directly 
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according to (j25p . We call the first one the optimized HHPF and the second one the direct 
HHPF. For the HMM scheme, the HMM window was set at m^r = 64 microtimesteps, and 
number of microtimesteps skipped to ignore transient effects is niT = 32. The number of 
replicas is set at = 1. 

Numerical experiments were performed using both HHPFs with varying sample sizes starting 
from A'^s = 2 to A'^s = 400. Both filters achieved better accuracy with increasing Ng, as they 
should, and Figures [2] and [3] each show the comparison of one dimension of the true state with its 
estimates using the HHPFs and the corresponding estimation errors. In both upper figures, blue 
curve represents the true state, broken red curve represents the estimate using the optimized 
HHPF, and green curve represents that of the direct HHPF. In the lower figures, blue curve 
represents the observation error, broken red curve and blue curves represent the estimation error 
of the optimized and direct HHPFs, respectively. The error shown is the absolute error over 

all 36 slow dimensions, i.e. error^ = \jY^=i{-^t ~ ■^t'^)'^- Figure [5] shows the HHPFs using 
just 2 particles and the optimized HHPF was seen to perform fairly well in estimating the true 
state, and the error plot showed that the estimate was only slightly worse than the observation 
error. This is not surprising, since all the slow states were observed and the particles were 
driven towards those states as indicated by the observations. Estimation error was contributed 
to in part by the stochasticity in the system and the observation noise, as well as the error due 
to approximation by numerical homogenization. The direct HHPF however fails to estimate 
the truth with Ng = 2. The estimate initially converged to the truth but diverges at around 
t = 2.5 and fails to recapture it after that. Both the optimized and direct HHPFs took just 
5 seconds to run over the entire interval of 320 macrotimesteps, which is a good performance 
for the optimized HHPF. But considering that the estimates were at best only as good as the 
noisy observations, the most computationally efficient choice would be to just use observations 
directly if observations of all states are available. 

By increasing the number of particles used, we see in the error plot of Figure [3] that the 
optimized HHPF was able to provide a better estimate of the truth than what was known 
directly from the observations. The run time for the experiment presented in the figure was 
135 seconds, which is the typical time recorded for the optimized HHPF with Ng = 100. The 
estimate from the direct HHPF improved but was at best as good as the observation with 
Ng = 100. Increasing Ng up to 400 improved both filters' estimates but not significantly for the 
optimized HHPF. The direct HHPF was seen to be able to match the optimized HHPF with 
Ng = 100 by using 600 particles but the run time required was 639 seconds. The second and 
third columns of Table [T] show the run times for different sample sizes for a typical experiment 
using the HHPFs. For each fixed A^^, run times for the optimized and direct HHPFs were 
observed to be within the same range, which was expected, since the only additional function 
evaluations in the optimized version is the calculation of the "nudging" correction that is just a 
linear combination of observation vector with particle locations. Occasional drastic variation in 
run times between the optimized and direct HHPFs, for example for Ng = 100 and Ng = 400, 
can be explained based on the number of times the resampling procedure were required to be 
performed in each filter. In each case, resampling may have been required more frequently due 
to the initial sample being significantly far from the truth, hence more resampling had to be 
performed in the beginning of the run, or at some point in time, a large number of particles 
were forced significantly far from the true state by the stochastic forcing. The second scenario 
is less likely to happen since the signal noise amplitude is small in comparison with that of 
the states. Resampling, however, will not drastically increase for the optimized HHPF because 
particles are simultanesously driven towards the truth based on the observations through the 
optimal propagation procedure. 

The same numerical experiments were also performed for the case of sparse observations (i.e. 
observing only the dimensions with odd index), to study the performance of the optimized 
HHPF in estimating hidden states. The same trend as for the case of non-sparse observations 
was observed in comparing the estimates using the optimized and direct HHPFs. For fixed Ng, 
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Figure 2. Non-sparse observations, optimized and direct HHPFs, Ng = 2. The 
optimized HHPF estimated the truth fairly well but was at best as good as 
observations. Increasing A'^, led to estimates that were better than obervation 
data. 

the optimized HHPF achieved better accuracy than the direct version, although both estimates 
displayed higher estimation errors compared to the non-sparse observation case at the same 
Ng due to the presence of hidden states. For this case, the optimized HHPF estimate was 
able to match the observation data of the observed states using 20 particles, and estimate the 
unobserved states as well. Figure S] shows the comparisons of the estimates from the HHPFs, 
using 100 particles for the optimized and 400 particles for the direct, with the truth. The upper 
plot shows the estimates of an observed state and the lower shows those of an unobserved state. 
The optimized HHPF (broken red curve) provided good estimates of all the states, including 
the unobserved ones and the corresponding error plot in Figure shows that the estimation 
was as good as the observation (of course, observations only observed half the states and the 
error shown only compares error in the observed dimensions; the HHPF estimates included 
the unobserved dimensions, shown to capture the truth in the lower plot in Figure H|). The 
direct HHPF, was also able to capture the shape of the true fluctuations in the observed and 
unobserved dimensions, but in the observed dimensions, the estimates were poorer than the 
observations, even with 400 particles. Increasing sample size further up to A^^ = 800 did not 
lead to significant improvement. The run times and the trend of increase in run time with 
sample size for the sparse observations experiments were the same as that of the non-sparse 
observations experiment shown in Table [TJ 

The homogenized filters comparison experiments showed that the optimized HHPF displayed 
significant estimation performance over the direct version for a fixed sample size. This indirectly 
led to improvement in computation time in comparison with the direct HHPF, in the sense that 
for the same or even better level of estimation error, the optimized HHPF could be implemented 
using smaller sample size than the direct version, hense reducing computational costs. In the 
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Figure 3. Non-sparse observations, optimized and direct HHPFs, Ng = 100. 
The optimized HHPF estimate was better than observation data but the direct 
HHPF was still not as good. The direct HHPF was able to match the optmized 
one with Ns = 600 at the cost of longer computation time. 



discussion here and in Table [H we have only considered the comparisons of the HHPFs with 
Ng up to 400, beyond which the run time advantage of the HHPF over other unhomogenized 
nonlinear filters, for the same level of accuracy, is lost. In the next set of experiments, we 
compare the optimized HHPF with two other nonlinear filters. 

5.0.7. Comparison of the optimized HHPF with other nonlinear filtering schemes. In the second 
set of numerical experiments, we compared the optimized HHPF with two other nonlinear filter- 
ing schemes: the ensemble Kalman filter (enKF) and a particle filter without homogenization. 
The aim was to compare the run time reduction in the HHPF, due to the implementation of a 
homogenization scheme, relative to unhomogenized nonlinear filters and assess the trade-off in 
estimation accuracy due to homogenization. We do not expect the optimized HHPF to outper- 
form an unhomogenized filter in terms of estimation accuracy, but the optimized HHPF should 
possess comparable estimation capabilities to unhomogenized filters, with the advantage of the 
HHPF being that it requires shorter run times. 

Since observations were generated at macrotimesteps of size 2~^, we have observations be- 
ing sparse in time as well for the unhomo genized filters, which use the sam e timestep size of 
2^^^ as the model simulation. The enKF Evensen and van Leeuwen 1996l | was implemented 



directly without modifications, by propagating the ensemble forward in time according to the 
model dynamics and performing information updates at timesteps when observations are avail- 
ab le. For the particle filter, we implement a modified SIS particle filter algorithm developed 
by van LeeuwenI (2010l | that is designed to accommodate for observations that are sparse in time. 



This particle filter is similar to the optimized HHPF presented here, in the sense that it uses 
the presently available observation to construct a better proposal density at a present time by 

20 





t 



Figure 4. Sparse observations, optimized {Ng = 100) and direct (A''^ = 400) 
HHPFs. The upper plot shows the estimates of an observed state, the lower plot 
shows that of an unobserved state. The unobserved state was estimated well by 
the optimzied HHPF with Ns = 100. Even with Ns = 400, The direct HHPF 
captures the fluctuations in the truth but did not fohow the trajectory well. 
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Figure 5. Sparse observations, estimation error of the observed states for the 
optimized {Ng = 100) and direct (Ng = 400) HHPFs compared with observa- 
tion error. The optimized HHPF estimates were as good as observations (but 
the optimized HHPF also provided estimates of the unobserved states so more 
information is gained by using the filter instead of just observation). 



driving particles in between observation timesteps using a time exponential function that is pro- 
portional to the model noise covariance and the distance of the intermediate par ticle locations 
from the observed state. For details and better insight to this particle filter, see Ivan Leeuwen 
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and from here on, we will denote this particle filter as just PF, but it is implied that it 
is the particle filter adapted for sparse-in-time observations. 

Similar numerical experiments as for the comparison of the optimized and direct HHPFs were 
performed for the enKF and the PF and the estimation results were compared with those of 
the HHPFs. We will first discuss the case of non-sparse (spatially) observations, i.e. the case 
H = IrxK- The PF was able to provide good estimates of the truth with a sample size of just 
2. Estimation error decreased as Ng was increased to 20 and was not seen to further decrease 
significantly as Ng was increased from 20. Figure [6] shows the estimates of one dimension of 
the true state using the EnKF, the optimized HHPF, and the PF with Ns = 20, and their 
corresponding estimation errors. The blue curve is for the truth, the black for the EnKF, red 
for the optimized HHPF, and green for the PF. The optimized HHPF displayed the highest 
estimation error, its estimate being as good as the observation. The EnKF and PF estimates 
are better than the observation, with estimation errors of equal magnitude. However, when 
considering the run times, the optimized HHPF took 30 seconds while the EnKF and PF took 
540 and 1757 seconds, respectively. Even with Ng = 2, the PF took 1727 seconds due to the 
timestep size and the functional evaluations required for particle weight calculations, as well as 
the resampling procedures. Additionally, based on the error plot in Figure El the estimation 
error of the optimized HHPF was not much worse than those of the EnKF and PF, and the 
estimate trajectory followed the truth very well apart from slight over- and under-shoots at 
local maxima and minima. This indicates that the optimized HHPF required less run time 
compared to the unhomogenized filters at a comparable level of estimation error. Increasing 
sample size showed that the optimized HHPF was almost as good as the PF at Ng = 100, as 
shown in Figure [71 with the EnKF being slightly better than both particle filters. The key 
point is that using 100 particles, the optimized HHPF took 134 seconds while the EnKF and 
PF took 2169 and 920 seconds, respectively. Considering that the levels of estimation errors 
are almost equal, the optimized HHPF provided significant advantage in terms of computation 
time. Further increasing the sample size to 400 enabled the optimized HHPF to match the PF. 
Beyond A^^ = 400 however, the optimized HHPF lost its computation time advantage, for the 
EnKF could be implemented using Ng = 50 or 100 with about the same level of estimation 
accuracy and computation time. 

Similar numerical experiments were again performed for the case of sparse observations (i.e. 
observing dimensions with odd index) and Figures [8] and [9] show the comparisons of the estimates 
of the observed and unobserved states from the filters using 20 and 100 particles, respectively. 
With 20 particles, the EnKF and PF provided slightly better estimates of the observed states 
than the observation, but the optimized HHPF performed rather poorly in comparison. How- 
ever, at Ng = 100, the optimized HHPF's performance improved significantly, performing as 
well as the PF. Figure [TOl shows the estimation error over the observed dimensions and, to com- 
pare the filter estimate over all slow dimensions. Figure fTTl shows the estimation error over all 
slow dimensions for each filter. The PF estimate converged to the truth faster but the optimized 
HHPF estimate eventually became as good as the PF's, with the EnKF's being the best. The 
key point is again that the optimized HHPF took 134 seconds while the EnKF and PF took 
920 and 2169 seconds respectively. Even if we considered using the EnKF with Ng = 20 (540 
seconds) and the PF with Ng = 2 (1727 seconds), both of which provided relatively low-error 
estimates, the optimized HHPF with Ng = 100 is still faster. So, even in the sparse observa- 
tions case, the optimized HHPF could be implemented in shorter time with estimation error 
comparable or equal to the EnKF and the PF. 

We do not claim that the HHPF is better than the EnKF or the PF; as shown, the unhomog- 
enized filters provided lower estimation error than the homogenized filter at low fixed Ng. As 
mentioned in the discussion of the previous set of experiments, the accuracy of the optimzied 
HHPF could be increased by increasing Ng, but beyond Ng = 400, it loses computation time 
advantage over the EnKF. However, in terms of computation time and cost of storage, the 
optimized HHPF displayed advantage over the unhomogenized filters, with comparable level of 
accuracy. The EnKF would have been the better choice of filter if the lowest possible estimation 
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Figure 6. Non-sparse observations, EnKF, PF and optimized HHPF compar- 
ison at fixed Ng = 20. The estimate of the optimized HHPF is comparable to 
those of the EnKF and the PF, but is obtained in a shorter run time. 



error was required over computational time. Additionally, the P F w as design ed to accommo- 
date temporally sparse observations, as it has been shown to do in Ivan Leeuwen 2010 ] and here. 
This capability still needs to be incorporated in the HHPF as presented here, because in most 
real time applications, the time interval for availability of observation data can be greater than 
the 45 minute-interval assumed here. 

In comparing the computational times of the numerical experiments using the EnKF, the 
PF and the optimized HHPF so far, we have not taken into account the cost of computing 
the homogenized observation function H. In addition to H being a constant matrix, we also 
assumed that the sensor observed only the slow states, hence the observation function was 
independent of the fast process. So, the homogenized observation function is the same as 
the unhomogenized one. However, we do not expect the cost of evaluating the homogenized 
observation function to drastically affect the computational time advantage of the optimized 
HHPF over other unhomogenized nonlinear filters. 



6. Conclusion and future directions 

6.0.8. Future directions. In chaotic systems, such as the one studied in this paper, the transients 
become irrelevant from the dynamical systems point of view and the motion of the solution 
settles typically near a subset of the state space, called an attractor. However, in the data 
assimilation problem that is of interest in this paper, we are interested in the transients and, in 
particular, in directions that are stretched by the transient dynamics. This sensitivity to initial 
conditions are characterized by the finite time Lyapunov exponents, which are determined by 
the behavior of two neighboring orbits or the two point motion of the nonlinear systems. 
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Figure 7. Non-sparse observations, estimation error comparison the EnKF, 
PF and optimized HHPF comaprison at fixed Ng = 100. The optimized HHPF 
could match the PF, withthe EnKF being shghtly better, but the optimized 
HHPF required the least computation time. 



Ns 


Opt. HHPF 


Direct HHPF 


PF 


EnKF 


2 


5 


5 


1727 


N/A 


10 


16 


15 


1716 


N/A 


20 


30 


20 




(540) 


50 


67 


46 


1936 


651 


100 


@ 


86 


2169 


920 


200 


177 


176 


3002 


1415 


400 


401 


539 


3591 


3152 


Table 1. Typical computation times (in sec.) for different samp 


e sizes for the 



different nonlinear filters, in the case of non-sparse observations. We see that 
for a fixed Ng, the HHPF required less computational time. Circled are the 
computation times corresponding to the sample sizes that led to the same levels 
of estimation accuracy for the three different filters compared in Section 15.0.71 
For the same level of accuracy, the optimized HHPF required a larger sample 
size but still performed in less time. Times for the case of sparse observations 
are of the same magnitudes and display the same trend. 



We are mostly interested in filtering deterministic chaotic systems, and the particle filtering 
methods developed above won't work without the addition of noise, because knowing the initial 
conditions Xq, the distribution of Xt is a Dirac measure. Therefore we add Gaussian noise 
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Figure 8. Sparse observations, EnKF, PF and optimized HHPF comparison at 
fixed Ng = 20. The optimized HHPF did not perform as well as the EnKF or 
PF. 



artificially. Hence, the finite time Lyapunov exponents may be handy in deciding how we 
choose the magnitude of the noise. 

The second question then arises as to the property of the sensor function, in the linea r case . 



199^ 1 



the span of the observ ation matrix. As pointed out by Lorenz iLorenz and Emanuell 
and Palmer et al.l 1998l |. who have coined the word "adaptive" or "targeted" observations, the 
sensors should be deployed at any given time, if the data that they gather are to be most 
effective in improving the analysis and forecasts. 

The results presented in this paper assume that the observations are available every 45 min- 
utes. The extension of this work that deals with every combination of the spatial and temporal 
sparsities, performing i ntermittent in time s parse data assimilation that mimics a global weather 



mode l is presented in iLingala et al.l |2012l ] . The particle method presented in iLingala et al. 



2012l | consists of control terms in the "prognostic" equations, that nudge the particles toward 



the observations, specially in the sparsest situation of observations in every 48 hours, and 
shows that control methods can be used as a basic and flexible tool for the construction of the 
proposal density inherent in particle filtering. 
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Figure 9. Sparse observations, EnKF, PF and optimized HHPF comparison 
at fixed Ng = 100. The optimized HHPF performed as well as the PF, but in 
shorter time than both the EnKF and PF. 
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